Text Deblurring Using OCR Word Confidence
Abstract
The objective of this paper is to propose a new deblurring method for motion-blurred textual images. The technique estimates the blur kernel, or point spread function (PSF), of the motion blur by blind deconvolution. Motion blur is caused by movement of either the camera or the object at the time of image capture. The PSF of motion blur is governed by two parameters: the length of the motion and the angle of the motion. In this approach, we estimate the PSF iteratively for different values of the length and angle of motion. For every estimated PSF, we deconvolve the blurred image to obtain the deblurred, or latent, image. The latent image is then fed to an optical character recognition (OCR) engine so that the text in the image can be recognized, and we calculate the Average Word Confidence of the recognized text. Thus, every estimated PSF and its latent image yield one Average Word Confidence value. The PSF with the highest Average Word Confidence is taken as the optimal PSF and is used to deblur the given textual image. The method requires no prior information about the PSF and takes only a single image as input. For evaluation, it has been tested on naturally blurred images, both captured manually and collected from the internet, as well as on artificially blurred images. The proposed algorithm is implemented in MATLAB.
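The search described in the abstract can be sketched roughly as follows. This is a minimal illustration in Python with NumPy, not the authors' MATLAB implementation. The abstract does not say which deconvolution algorithm is used, so a simple Wiener filter is swapped in here as an assumption. The function names (`motion_psf`, `pad_psf`, `wiener_deconvolve`, `estimate_psf`), the `nsr` regularization constant, and the `score_fn` callback are all hypothetical; `score_fn` stands in for "run OCR on the latent image and return the Average Word Confidence".

```python
import numpy as np


def motion_psf(length, angle_deg):
    """Line-shaped motion PSF, `length` pixels long at `angle_deg` from the x-axis."""
    size = int(np.ceil(length)) | 1          # odd side so the line has a centre pixel
    psf = np.zeros((size, size))
    c = size // 2
    theta = np.deg2rad(angle_deg)
    # Sample densely along the segment and mark the nearest pixel each time.
    for t in np.linspace(-(length - 1) / 2, (length - 1) / 2, 4 * size):
        row = int(round(c - t * np.sin(theta)))
        col = int(round(c + t * np.cos(theta)))
        psf[row, col] = 1.0
    return psf / psf.sum()                   # a PSF must sum to 1


def pad_psf(psf, shape):
    """Zero-pad the PSF to the image shape, with its centre moved to pixel (0, 0)."""
    padded = np.zeros(shape)
    k = psf.shape[0]
    padded[:k, :k] = psf
    return np.roll(padded, (-(k // 2), -(k // 2)), axis=(0, 1))


def wiener_deconvolve(blurred, psf, nsr=1e-2):
    """Wiener filter in the frequency domain (assumed stand-in for the paper's step)."""
    H = np.fft.fft2(pad_psf(psf, blurred.shape))
    G = np.fft.fft2(blurred)
    F = np.conj(H) * G / (np.abs(H) ** 2 + nsr)   # nsr: assumed noise-to-signal ratio
    return np.real(np.fft.ifft2(F))


def estimate_psf(blurred, score_fn, lengths, angles, nsr=1e-2):
    """Try every (length, angle) pair and keep the one whose latent image scores best.

    `score_fn(latent)` is a hypothetical callback that should return the
    Average Word Confidence reported by an OCR engine for `latent`.
    Returns (best_score, best_length, best_angle).
    """
    best = (-np.inf, None, None)
    for length in lengths:
        for angle in angles:
            latent = wiener_deconvolve(blurred, motion_psf(length, angle), nsr)
            score = score_fn(latent)
            if score > best[0]:
                best = (score, length, angle)
    return best
```

In practice `score_fn` would wrap an OCR engine. One possible choice, assuming Tesseract is installed, is to average the non-negative entries of the `conf` field returned by `pytesseract.image_to_data`. The grid of candidate lengths and angles is left to the caller because the abstract does not state the ranges or step sizes that were searched.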
Similar resources
Combining multiple thresholding binarization values to improve OCR output
For noisy, historical documents, a high optical character recognition (OCR) word error rate (WER) can render the OCR text unusable. Since image binarization is often the method used to identify foreground pixels, a significant body of research has sought to improve image-wide binarization directly. Instead of relying on any one imperfect binarization technique, our method incorporates informati...
Word Segmentation for Urdu OCR System
This paper presents a technique for Word segmentation for the Urdu OCR system. Word segmentation or word tokenization is a preliminary task for understanding the meanings of sentences in Urdu language processing. Several techniques are available for word segmentation in other languages but not much work has been done for word segmentation of Urdu Optical Character Recognition (OCR) System. A me...
OCR Post-Processing Error Correction Algorithm Using Google's Online Spelling Suggestion
With the advent of digital optical scanners, a lot of paper-based books, textbooks, magazines, articles, and documents are being transformed into an electronic version that can be manipulated by a computer. For this purpose, OCR, short for Optical Character Recognition, was developed to translate scanned graphical text into editable computer text. Unfortunately, OCR is still imperfect as it occa...
OCR Post-Processing Error Correction Algorithm using Google Online Spelling Suggestion
With the advent of digital optical scanners, a lot of paper-based books, textbooks, magazines, articles, and documents are being transformed into an electronic version that can be manipulated by a computer. For this purpose, OCR, short for Optical Character Recognition, was developed to translate scanned graphical text into editable computer text. Unfortunately, OCR is still imperfect as it occa...
OCR Error Correction Using Statistical Machine Translation
In this paper, we explore the use of a statistical machine translation system for optical character recognition (OCR) error correction. We investigate the use of word and character-level models to support a translation from OCR system output to correct French text. Our experiments show that character and word based machine translation correction make significant improvements to the quality of t...